05:06
2026-06-15
lesswrong.com
machine-learning
VFUSE: Virulent Feature Understanding With Sparse AutoEncoders
Researchers introduced VFUSE, a mechanistic interpretability approach using sparse autoencoders to audit protein models for hazardous features. Applied to RoseTTAFold3 and RFDiffusion3, linear probes โฆ